A Neural Probabilistic Language Model
https://www.jmlr.org/papers/volume3/bengio03a/bengio03a.pdf
2003
Abstract
Traditional but very successful approaches based on n-grams obtain generalization by concatenating very short overlapping sequences seen in the training set.
We propose to fight the curse of dimensionality by learning a distributed representation for words which allows each training sentence to inform the model about an exponential number of semantically neighboring sentences.
The aim is to overcome the curse of dimensionality.
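As a rough sketch of the idea (not the paper's exact configuration): each context word is mapped to a learned feature vector (the distributed representation), the vectors are concatenated, passed through a tanh hidden layer, and a softmax over the vocabulary predicts the next word. The code below assumes PyTorch; the vocabulary size, context length, and layer widths are illustrative, and the paper's optional direct input-to-output connections are omitted.

```python
import torch
import torch.nn as nn

class NPLM(nn.Module):
    """Minimal sketch of a neural probabilistic language model:
    learned word feature vectors -> concatenate -> tanh hidden layer
    -> softmax over the vocabulary for the next word."""
    def __init__(self, vocab_size, context_size=3, embed_dim=60, hidden_dim=100):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, embed_dim)                # word feature matrix
        self.hidden = nn.Linear(context_size * embed_dim, hidden_dim)   # hidden layer
        self.out = nn.Linear(hidden_dim, vocab_size)                    # output layer

    def forward(self, context_ids):
        # context_ids: (batch, context_size) indices of the previous words
        x = self.embed(context_ids).flatten(1)   # concatenate the feature vectors
        h = torch.tanh(self.hidden(x))
        return self.out(h)                       # logits over the next word

# usage: predict the next word from a 3-word context (random toy input)
model = NPLM(vocab_size=10000)
logits = model(torch.randint(0, 10000, (4, 3)))  # batch of 4 contexts
probs = torch.softmax(logits, dim=-1)            # P(w_t | previous 3 words)
```

Because similar words end up with similar feature vectors, a sentence seen in training also raises the probability of many semantically neighboring sentences, which is how the model generalizes where an n-gram model cannot.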